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Abstract 

We study the spread of susceptible-infected-recovered (SIR) infectious 
diseases where an individual's infectiousness and probability of recovery 
depend on his/her "age" of infection. We focus first on early outbreak 
stages when stochastic effects dominate and show that epidemics tend 
to happen faster than deterministic calculations predict. If an outbreak 
is sufficiently large, stochastic effects are negligible and we modify the 
standard ordinary differential equation (ODE) model to accommodate 
age-of-infection effects. We avoid the use of partial differential equations 
which typically appear in related models. We introduce a "memoryless" 
ODE system which approximates the true solutions. Finally, we analyze 
the transition from the stochastic to the deterministic phase. 



1 Introduction 

Infectious diseases continue to impact public health. The previous emergence 
of SARS, the ongoing emergence of HlNl swine influenza, and the simmering 
threat of H5N1 avian influenza or other diseases call attention to the need to 
prepare for a quickly-spreading pandemic. Such a pandemic could have typi- 
cal infectious period measured in days or weeks, spread worldwide, and grow 
quickly. In the face of such an emerging disease, there is little time to develop 
and implement interventions. 

The ability to predict the timing and maximum patient load imposed by 
an epidemic is essential to intervention design. Overestimating the preparation 
time available or underestimating the peak may result in well-designed measures 
which are implemented too late or are too small. 

The ability of an infectious disease to spread depends strongly on the number 
of susceptible individuals S, and the total population size N. We will find that 
the details of the spread are more sensative to changes in N/S than changes in 
S/N, and so we will couch most of our discussion in terms of changes in N/S. 

Figure [l] shows the course of an epidemic of an infectious disease whose 
characteristics are discussed later (j ]3.2.2 with c = 0.9). At very early times 
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Figure 1: The course of an epidemic with vertical logscale (left). The cumulative 
age-of-infection distribution JJ" r') dr' at different times (center and right). 

the disease spreads as a branching process and stochastic effects are important. 
As the outbreak grows, the spread continues as a branching process, but the 
stochastic effects lose importance. However, the timing of the epidemic always 
feels the initial stochastic impact. 

We also consider the infection-age distribution i(t, r), the number of people 
infected at time t who have been infected for r units of time. We plot the 
cumulative distribution in figure [l] at small t (center) and larger t (right). At 
small t the distributions are noisy, and converge to a steady-state distribution 
as t increases. As the spread continues, N/S begins to change perceptibly and 
the steady-state adjusts adiabatically if N/S changes slowly enough. If N/S 
does not change slowly, the system cannot adjust to the changing equilibrium. 
During the growing phase of the epidemic, the infected individuals are weighted 
towards more recent infections, while during the declining phase the infected 
individuals have disproportionately older infections. 

We focus on several stages in this paper: the early stochastic phase, the later 
deterministic phase, and the transition phase between these two. If S is initially 
small, then N/S can change significantly during the stochastic phase. We do 
not address this case. 

Typically disease outbreaks are either subcritical (meaning TZq < 1) for 
which epidemics are impossible because an average infected person infects fewer 
than one individual, or supercritical (meaning TZq > 1) for which epidemics 
are possible. We consider only supercritical outbreaks. Early in an outbreak's 
spread, growth is dominated by stochastic effects, and it may die out stochasti- 
cally. If it persists, it may grow faster or slower than "average" . As long as N/S 
does not change significantly, the spread can be modeled using Crump-Mode- 
Jagers (CM J) processes [3 O [TH [TU] . A subcritical CM J process dies out, while 
a supercritical CM J process either dies out or converges to Ce'^* where C is a 
random value and (j) depends on the process. 

If a supercritical outbreak becomes sufficiently large the spread is effectively 
deterministic. The usual equations for this phase are the susceptible-infected- 
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recovered (SIR) equations 



S = -PIS/N (1) 
/ = (3IS/N - -fl (2) 
R^jl (3) 

These equations assume that infected people cause infections at rate (3 and re- 
cover at rate 7, giving an exponentially distributed infection duration. The 
process is "memory less" . In contrast, for real diseases the "age" of an individ- 
ual's infection affects his/her infectiousness and probability of recovering. 

Ignoring "age-of-infection" effects loses important details. During the growth 
of an epidemic the infections are biased towards young infection ages. If young 
infections are more (or less) infectious, the SIR equations under- (or over-) 
estimate the growth rate. Similar observations hold during decay. 

Several approaches have been developed to study age-of-infection models. 
Some explicitly track the history of the epidemic [H [lU |3l El [HI [3 120] . Others 
attempt to maintain the memoryless feature of equations ([lJ-([3]) by introducing 
a chain of infected compartments /i , . . . , /„ in order to approximate the infec- 
tious period distribution [T] [221 [TT) [3 EH [H] . These chains of compartments 
usually do not have biological meaning, but instead are a simplifying "trick". 
Typically these assume constant (3 and that each of n infected classes recovers 
at rate jn, resulting in gamma-distributed infectious periods. 

In this paper we investigate the growth of an epidemic from a single infection 
to a full-scale epidemic, without the restrictive assumptions underlying ([T])-([3]). 
In Sj2] we show how to model the early stochastic phase, and give some com- 
parison with deterministic predictions. In ^|3]we show how to find deterministic 
equations governing the epidemic's growth. We take a different approach from 
most previous studies and arrive at a system similar to the standard equa- 
tions ([l])-(|3| rather than a partial differential equation. If the change in N/S is 
not large during a typical infectious period, we can approximate the infectious 
population as being in equilibrium given N/ S and arrive at a memoryless sys- 
tem that captures the dynamics well. In f|4]we examine what it means for the 
outbreak to be large enough to be deterministic. 



2 Stochastic Phase 

We assume that the disease spreads from individual to individual in such a way 
that the abihty of individual u to infect a susceptible individual depends only on 
how long u has been infected and whether or not u has recovered. We let P(r) 
be the probability u is still infected r units of time after becoming infected. If 
u is still infected, the rate u causes new infections is P{t)S/N. This enforces a 
possibly unrealistic assumption that infectiousness is independent of infection 
duration. It would be straightforward to modify the model to incorporate this 
effect, but we do not do it here. 
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Figure 2: The probability of having 0, 5, or 20 people infected as functions of 
time beginning with a single index case: comparison of theory (dashed) and 
50000 simulations. 



We have P{0) — 1 and — assuming no-one remains infectious forever — 
P{oo) = 0. We assume P is differentiable. The probability of recovering in a 
short interval {t,t + At) is -P'(t) Ar + ©(Ar^). We let Predr) be the rate at 
which recovery happens: Prec{T) = —P'{t) > 0. 



2.1 The equations 

We have full derivations of the equations in appcndix|A] If pk (t) is the probability 
that k individuals are infected at time t, then the probability generating function 
(pgf) f{x, t) — J2'k'=oPk{t)x'^ provides a useful tool to help calculate pk- We get 



fix,t) = xP{t)g{x,t\t) 



I g{x,t\T)Prec{T)AT (4) 

Jo 



Here g{x,t\T) — 'Y^ qk{t\T)x*' is the pgf for the number of descendants an indi- 
vidual has t units of time after its infection given that it recovers t <t units of 
time after infection. That is qk{t\T) is the probability an individual has k infec- 
tious descendants t units of time after becoming infected given that it recovers 
after t < t units of time. 
We find (for t <t) 

g{x,t\T)^exp( f [f{x,t-e)-m9)d0) (5) 







To find equations for pk{t) we take the k-th derivative of /, divide by fc!, and 
evaluate at x = 0. We solve the equations as described in appendix [B] 

We compare the solutions with 50 000 simulations in figure |2] We take the 
probability distribution function (pdf) of the infection duration to be a WeibuU 
distribution, 1^(5.8,2.59), so P(t) = e-*^/^-®)' . We take constant /3 = 2. 
Although there is considerable noise in simulations, we find close match with 
analytic results. 



2.2 Asymptotic behavior at large / 

If 5(0) is large, then N/S may still be approximately constant even as / becomes 
much larger than 1 . We are interested in the behavior of / as it becomes large, 
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but before N/S has changed significantly. If we assume N/ 5=1 remains fixed, 
then under weak assumptions it can be shown [71 E] that I{t) either becomes 
zero at some finite time or it converges to Ce^* where C is a random number 
determined by stochastic effects and solves 



This equation is the Euler-Lotka (EL) equation, which we derive in |3] The 
solution (j) is unique and known as the Malthusian parameter. The most signif- 
icant assumption we require for this convergence is that the infection is not a 
"lattice" process, that is, possible times of infection are not discretized and so I 
can change change continuously This result guarantees that if the susceptible 
population is sufficiently large, the outbreak either dies out or becomes large 
enough that the growth is deterministic. 

We have shown that equations Q and ([s]) accurately predict the probability 
of having a given number of infections as a function of time. Once the outbreak 
is sufficiently large, the impact of stochastic effects is reduced and the infected 
population size scales like Ce''^* for fixed (j). The random value of C determines 
how much time is available to prepare for the epidemic. 

2.3 Distribution of epidemic onset times 

We turn to a simpler disease process to investigate the impact of the stochas- 
tic phase on how quickly an epidemic "takes off" . We consider a population 
with constant infectiousness and exponentially distributed infection durations 
(corresponding to a constant recovery rate). We compare predictions from the 
stochastic model with predictions from the deterministic equations ([l])~([3| which 
are exactly valid precisely for this infection process. We take (3 — 1.5 and 7=1. 

Figure |3] shows that if the initial number of infections is low, it is relatively 
likely that the number infected becomes large before the deterministic equations 
predict it should. This has a number of implications for interpreting early stages 
of an outbreak. If we attempt to predict the present size of an outbreak given a 
known introduction date using the assumption of deterministic growth, we are 
likely to underpredict the current size. Consequently if we make preparations to 
introduce interventions under the assumption of deterministic growth, we may 
be using interventions that are too small and implemented too late. 

The mismatch decreases as the initial number of infections increase. We 
explain this observation by noting that outbreaks with only a few infections 
grow on average at the deterministically predicted rate. However, those at the 
lower range of growth often go extinct, while those at the higher range tend to 
become epidemics quickly. This leads to the important conclusion that if an 
epidemic happens, it is likely to happen faster than the deterministic equations 
predict. 

^For lattice processes, similar results apply with discrete rather than continuous time. 
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Figure 3: A comparison of the deterministically predicted time at which 1000 
individuals are infected (vertical dashed lines) with the actual probabilities (solid 
curves) of having 1000 individuals infected at each time given different numbers 
of initial infections. 

3 Deterministic Phase 

In this section we develop the deterministic equations governing epidemics once 
stochastic effects are unimportant. Our exact equations are equivalent to many 
previous age-of-infection models [H [TTJ |3J HI [TJl |5J HD] , but we avoid the use 
of PDEs which usually arise. A related approach also avoiding PDEs was used 
by [3], but we cast our equations in a form similar to the standard SIR equa- 
tions ([l])-([3]). We then introduce an approximation to these equations. We 
discuss the transition from the stochastic phase to the deterministic phase in 



In the stochastic phase analysis, we assumed that infectiousness is indepen- 
dent of the recovery time (except that after recovery infectiousness is zero) . We 
can drop this assumption here and redefine /3(r) as the average rate of infection 
T units of time after infection for those individuals still infected (of which a 
fraction S/N are successful). The product (5{t)P{t) represents the expected 
rate of new infections caused by an individual u infected r units of time pre- 
viously, where the expectation is taken without prior knowledge of whether u 
has recovered. We normalize this by TZq = (3{t)P{t) dr to arrive at the 
generation interval distribution P{t)P{t)/TZo [T9 | |2T | . 

Let b{t) denote the rate of new infections occurring at time t and d{t) the 
rate of recoveries. Let z(t, r) denote the number of people who became infected 
at time t — t and are still infected at time t. Then i{t,T) = b{t — t)P{t). We 
can find b in terms of i by b{t) = i{t, T){S/N)f3{T) dr and d in terms of b by 



If N/S is constant the age-of-infection distribution converges to a steady- 
state where i{t, r) /I{t) is independent of t. The population size grows or decays 
exponentially, so b{t) = Ce**^ where (j) solves the modified EL equation 



SI 



d{t) = J^b{t-T)Precir)dT. 
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This has been used at early times [2T] when N/S « 1 to relate the exponential 
growth in time </> with TZq. 

We use the constant N/S solution as the basis for our approach with chang- 
ing N/S. We take b{t) = Ce«(*) 



Rearrangement gives 



N 



(3{t)P{t) dr 



p?(t) 



N 



I3{t)P{t) dr 



We derive equations for I and 5* in terms of ^ as follows: The derivative 
of S is -b{t) ^ -Ce«(*). We multiply by 1 = // t) dr, using i(t,r) = 

bit ~ t)P{t) = Ce«(*-^)P(r) to get 



where g[£„ t] = e«(*"^)P(r) dr. Repeating this for / = b{t) - d{t) we get 
f I H[^,t] T[^,t] IS H[^,t] 

g[(,t] g[^,t] g[^,t] n g[(,t] 

where n[^,t] = e«(*-^)P^ec(T) dr. This can be written in a similar form 
to the standard SIR equations, except that the coefficients change in time and 
depend on the history of the epidemic 



s = -m 



IS^ 



R 



N 



s 



(6) 

(7) 
(8) 
(9) 



where i3{t) = T[£^,t]/g[S^,t] and ^{t) — 'H[^,t]/t/[^,i]. Because of the similarity 
in notation, we distinguish (3{t) to be the average rate of causing infection of 
all individuals infected at time t, while /3(r) is the average rate of causing 
infection by an individual still infected r units of time after becoming infected. 
To initialize the problem we need ^(i) for all t < as well as 5(0) and /(O). 
Typically we will assume that ^(t) — — oo for t < so that e^'^*-' — 0. As we solve 
forward, new values of ^ are calculated based on the change in S. The history 
of ^(i — r) for r > encodes all information needed about the age-of-infection 
distribution at i. A less intuitive, but simpler formulation of these equations 
appears in appendix [C] 
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3.1 Approximating the solution 

Storing the history of an outbreak introduces some mild analytical and compu- 
tational difficulties. It is convenient to work with a system that depends only 
on its current state. If N/S varies slowly relative to how quickly ^ changes, 
we can assume that the system responds adiabatically to changes in N/S and 
so the age-of-infection distribution is at equilibrium with the current value of 
N/S. This assumption will allow us to create equations analagous to ([l|-(|3| 
with changing coefficients, which may be solved by standard ODE methods. 
This approach will break down if N/S changes significantly during a typical 
infectious period. Fortunately, we can use the results of the approximation to 
identify when the approximation fails. 

We replace ^(t—r) by ^(t)— /J" 4>{t—9) dO where (j){t) = ^'{t) and approximate 
J-/e^, G/e^, and Ti/e^ by F{(j)), G{(f)), and H{(j)) respectively assuming that 
(j)(t — t) ~ <j}{t) for the range of r which make a significant contribution to the 
integral. 



F{(f) = / e-"^W/3(T)P(r)dr 
Jo 

/>oo 

Jo 



Note that each of these is a Laplace transform. The resulting approximating 
equations are 

s^-m^ (10) 

TQ 

I^(3o{t)--%{t)I (11) 
R = %{t)I (12) 

= W) ^''^ 

where po{t) = F{4>)/G{4') and %{t) = iJ(0)/G(0). 

Computationally this system of equations is only mildly more difficult than 
the standard SIR equations. We can either find the functional forms of the 
Laplace transforms, or simply calculate them for various (p in advance. Once 
that is done, then at each time step, we need only look at N/S, identify (j) such 
that F{(f)) — N/S, and then find G and H. Then the integration proceeds as in 
the standard SIR equations. 

The approximation is valid as long as the amount of change of N/S during 
a typical infectious period is small, and is therefore valid well into the nonlinear 
regime after the exponential growth phase has ended. 
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Figure 4: Comparison of simulations with exact age-of-infection model, approx- 
imation, and two parametrizations of the SIR equations. The temporal shift of 
the exact and approximate solutions is a result of difference in initial condition. 
The exact solution takes the initial condition that £^{t) = for t < while the 
approximate solution assumes that £^{t) ~ t(j){0) for t < 0. 



3.2 Examples 

3.2.1 The usual suspects 

If we make the usual assumptions of constant infectiousness and exponentially 
distributed recovery time [f3 constant and Prec{T) = 76"''"^] the system is mem- 
oryless. The function ^ encodes the age-of-infection distribution, which is ir- 
relevant in a memoryless system. Thus the equations for / and S should not 
depend on ^. We find t] = (30 t], and so S = -filS/N. We similarly find 
, — 7 and so / = (3IS/N — 7/. So in this special case the exact 
age-of-infection model ([6|-([9| reduces to the standard SIR equations ([T])~([3]). 



This holds even for our approximate system ( 10 )-( 13 1 




3.2.2 A piecewise continuous example 

We take 

'c < r < 1 or 2 < T < 3 

~ ~ ~ ~ (14) 

otherwise 

1/2 1<.<3 ^^^^ 
otherwise 

So people are initially infectious, then stop being infectious at r = 1 and begin 
to recover. At r = 2, they continue recovering, but become infectious once more. 
By r = 3 all individuals have recovered. Such a system could model a disease in 
which individuals are infectious before and possibly after having symptoms, but 
self-isolate during the symptomatic phase. The generation interval distribution 
is given by 



7^n 



5 2 < r < 3 (16) 
otherwise 



In figure |4] we find that the exact model ([Gjl-Q fits the simulations well 
(with the discrepancy due to stochastic shifts in time) . The difference in timing 
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Figure 5: The standard SIR equations cannot closely capture the dynamics of 
the disease spread, regardless of whether we preserve the average duration of 
infection or the average generation interval. 






Figure 6: For gamma distributed recovery time with constant infectiousness, the 
exact system differs from simulations only in time shifts. The approximation 
closely matches the initial growth phase, but begins deviating close to the peak. 



between the exact and approximate solution (10l-(13l is due to differences in 
initial conditions: the exact calculation assumes a single infection beginning at 
t = while the approximate solution assumes that the epidemic begins with 
the equilibrium age-distribution already reached by t = 0. The approximate 
model is a good fit for the behavior at early times and remains a good ap- 
proximation until the change in N/S becomes significant over the duration of 
an infection. The approximation performs best in those situations where the 
number of infections remains smaller. 

If we attempt to approximate the epidemic course using the standard SIR 
model ([T|)-([3]), then we have two free parameters (3 and 7. We can identify 
(at least) three constraints: TZq, average duration of infection, and average 
generation interval. We can only match two of these at a time, which we show 
in figure [5] If we choose to match TZq and average duration of infection then the 
total number of infected person-days is correct, but the timing is far off. If we 
choose to match TZq and average generation interval, then the timing is much 
closer, but the peak patient load is significantly underestimated. 



3.2.3 Gamma-distributed recovery times 

Recently |22j investigated some of the role the distribution of infection duration 
has on the dynamics of an epidemic. They considered a gamma-distributed 
infectious period with constant infectiousness. The model they studied corre- 
sponds to a chain of 100 exponentially distributed infectious classes, each with 
infectiousness (3 and expected duration 1/100. They showed that the standard 
SIR equations ([lj-(|3| provide a poor approximation. 
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For this system, Prec{T) — t"~^ exp(— nT)n"'/(n — 1)! where n = 100. The 
Laplace transform of this is (l + ^/n)~". From this we can derive the transforms 
of P and (3P, which allows us to define the coefficients for our approximation. 
Figure|6]shows that the approximation closely follows the early growth even after 
the exponential phase ends. It finally deviates close to the epidemic peak, but 
it gives a reasonable estimate of the timing and maximum load of the epidemic. 



4 Transition Phase 

We have shown that stochastic effects play an important role on whether an 
epidemic occurs and the timing of an epidemic if it does occur. We have also 
seen that once the epidemic is sufficiently large, it follows the deterministic 
predictions. We borrow an approach from [8] to identify when the transition 
from the stochastic phase to the deterministic phase occurs. For simplicity in 
our analysis, we will assume that the process is not highly peaked. This allows 
us to assume that i(t,r)//(t) is close to its equilibrium state. 

In order to treat the dynamics as deterministic over a time interval At, we 
must satisfy two competing conditions. First, we need the time interval to be 
large enough that the number of infections and recoveries that happen in that 
interval is well-approximated by the expected number. That is, we need the 
expected error to be small compared to the expected value, and so the coefficient 
of variation (the square root of the variance divided by the expectation) is 
small. Assuming that the rates remain constant, the infection and recovery 
processes are both Poisson, and so their difference is a Skellam distribution, 
which has variance /At(/3 + 7) [TH] [13]. Consequently the condition we need is 



that ^IAt{f3 + j)/IAt\(3-j\ < 1. So 

At » ^ + ^ (17) 

/(/3-7)2 

Second, we need the time interval to be small enough that the rate at which the 
infectious population size changes is not affected by changes in the infectious 
population. That is we need AI « (/? — 7)/Ai <C /. So 

At < — (18) 



For small values of /, conditions ( 17 ) and ( 18 1 cannot be satisfied simultaneously. 



Combining these conditions we need that 

P + 1 



1/9-71 

More strictly, we actually require that \/l (/3 + 7)/|/3 — 7I. 

The analysis we have done does not apply close to the peak of the epidemic 
(where /3 = 7). Here we can replace condition (|17[) with the requirement that 
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the error in the number of new infections is small compared to the number of 
new infections and similarly for the number of recoveries. In general we need 
condition (18 1 combined with either this pair of conditions or condition (17) to 
guarantee that the deterministic equations apply. For practical purposes, once 
the deterministic equations hold, we expect them to hold through the peak until 
I decays at which point we can use (17 1 again. 

If the generation interval distribution were highly peaked around some typ- 
ical time, then we could still argue that the system is deterministic, but we 
would have to explicitly set the history of ^ rather than assuming it is takes 
the equilibrium form. By assuming the equilibrium distribution we can treat 
infections as occurring at a slowly changing rate. 



5 Discussion 

A typical disease outbreak begins small and whether it grows or becomes extinct 
is strongly influenced by stochastic effects. If it grows, it generally does so faster 
than predicted deterministically because those outbreaks which are most likely 
to not die out stochastically are those which initially grow faster than average. 
Consequently if we observe an epidemic, it is likely to have grown to an epidemic 
faster than deterministic equations predict. 

Once an outbreak becomes large, it transitions to a deterministic phase. We 
can estimate the size an outbreak must reach to be deterministic by identifying a 
time interval which is large enough that many events happen in the interval (and 
so the error of a deterministic prediction is small compared to the prediction), 
while at the same time the interval is small enough that the size of / and S do 
not change significantly. Such a time interval can only exist if / is sufficiently 
large. 

Once an outbreak is deterministic, we can use the deterministic equations 
to accurately model the spread once a correcting time shift is apphed. These 
equations are somewhat difficult because they require saving the history of an 
epidemic, and so it may be more convenient to use approximate models. We have 
introduced an approximate model based on standard compartmental models. 
We assume that the system responds adiabatically to changes in the susceptible 
fraction. It uses a single infectious class, but has coefficients that change in 
time. It provides a good estimate of the early behavior, but may deviate close 
to the peak. We can estimate when it deviates by looking at how quickly the 
susceptible fraction changes during a typical infectious period. 

We have assumed throughout that the infectious population can be modeled 
in continuous time. If the generation interval is discrete, then these assump- 
tions fail, but similar approaches work in discrete time. A more complicated 
situation arises when the generation interval distribution is close to discrete: If 
the distribution is tightly peaked about a mean which is sufficiently far from 
zero, then it may take many generations for the infectious population to reach 
equilibrium. The population may become deterministic before the population 
reaches equilibrium, in which case our exact equations will provide a good model 
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(assuming appropriate initial conditions) while our first approximation may fail 
badly. Our second approximation may require a long chain of infectious classes 
in order to reproduce the correct dynamics. 

The models we have developed are straightforward to adapt to SIR with 
birth or death, SIS, or SIRS. In fact, such situations will be more amenable to 
our first approximating method because the rate of change of N/S is reduced. 
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A Probability Generating Functions 

A probability generating function (pgf) is a function f{x) which encodes a prob- 
ability distribution of non-negative integers (23] . Given that the probability of 
k is pk we define the function 

f{x) = po + Pix'^ + P2x'^ H 

Probability generating functions have a number of useful properties. The prod- 
uct of two pgfs is itself a pgf for the sum of two numbers chosen from each 
distribution. From this fact, it can be shown that for two pgfs / and g encoding 
the distributions Pg and Pf respectively, the function f{g{x)) is the pgf for the 
distribution found by choosing a random number s from Pf, and then taking 
the sum of s random numbers from Pg. 

This property of function composition is useful in our context to deal with 
taking a random number of infecteds (corresponding to Pf), and each of them 
infects a random number of susceptibles (from a distribution Pg). The resulting 
number of new infections is given by the composition of the corresponding pgfs. 

A.l derivation of equations 

We assume that the population is sufficiently large relative to the number of 
infections, that no infections are prevented by depletion of susceptibles. We 
focus our attention on a single infected individual u and its descendants. We can 
assume that t = when u becomes infected. Let f{x,t) be the time-dependent 
pgf for the number of individuals (descended from u, including u) who are 
infected at t. That is f{x,t) = Y^^=oPn{i)x^ where Pn{t) is the probability 
that n individuals are infected at time t. 
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Let g{x,t\T) be the pgf for the number of infectious descendants u has t 
units of time after becoming infected given that its infection lasts r units of 
time. Note that if t > t, then g{x,t\T) = g{x,t\t). Then the number of current 
infections is given by a weighted average of the number of descendants (plus 1 
if u is still infectious). Encoding this as a statement for pgfs gives 

f{x,t) = xP{t)g{x,t\t) + / g{x,t\ (t) dr 

Jo 

The number of infections resulting from an individual v infected at time 6 has 
pgf f{x, t — 9). This allows us to express g in terms of /. 

To find g, we consider an individual who recovers at time t and divide the 
duration of infectiousness into small sized blocks. The pgf for the number 
of infections at time t due to an infection that occurs in the interval \9, + A^) 
is f{x,t-e) + 0{M). The infection occurs with probability 0{e)M + 0{M'^). 
The probability that infection does not occur during that time period is 1 — 
l3{6)A9 + 0{A9^). Consequently the pgf for the number of infections at time t 
resulting from infections in the time interval of interest is: 

1 + [f{x, t-9)- + 0{M'^) 

The pgf for the number of infections occurring in any of the time intervals is the 
product of the individual generating functions. Consequently, taking A0 0, 
the pgf for the number of descendants an individual has at time t given that it 
recovers at r < t is 

g{x,t\T)= lim Wl + [f{x,t-iA9)-l]l3{iM)M + 0{M'^) 

i=0 

r/AS 

= lim exp( ln(l + [f{x, t - iM) - l]p{iA9)A9 + 0{A6'^) 

i=0 
(r/AS 

= lim exp [f{x,t- iA9) - l](3{iA9)A9 + 0{Ae^) 

y i=o 

= e^p(^£[f{x,t-9)-l]md0 

If the individual recovers at time t > t, then the pgf for the number of descen- 
dants at time t including itself satisfies g{x,t\T) = xg{x,t\t). 

This expression for g can be derived alternately by considering a large pop- 
ulation size N and noting that if the expected number of infections caused 
by ?; is r = (3{9) d9, then the probability of infecting each individual is 
p = P{9)/N The probability of infecting n people is then - p)^"". 

From this we can derive the pgf for the number of infections caused directly 
from V, and then using function composition will arrive at the same expression. 
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B Notes on the numerics for the stochastic prob- 
lem 



We take f{x,t) = J2Pk{t)x'' and g{x,t\T) — J2lk{t\T)x'' where pk gives the 
probabihty of having k people infected at time t, while q^. gives the probability 
of having k descendants given that recovery occurs at time r. If we take k 
derivatives of these equations, divide by k\ and evaluate at a; = 0, we get the 
probability of k infections. The resulting system of equations is straightforward 
to solve numerically. As our initial condition at t = we generally set all 
derivatives of / to be except the first derivative, which is 1, though other 
options are possible. 

If we make a simplifying assumption that /3 is constant, we can find an 
expression for g which reduces the dimenionality of the problem. We have 

[ [f{x,t-9)-l](3de^ P ~T+ [ f{x,t-6)de- [ f{x,t-9)d9 

Jo L "'0 Jt 

= -/3T + P[ff{x,9)d9- f ^ f{x,9)d9] 
Jo Jo 

We define the auxiliary function (^{x, s) = f{x, 9) d9. Then 

g[x, t\T) = exp /3[C(a;, t) - C{x, t ~ t) - t] 
Our equation for / remains 

f{x,t)^xP{t)g{x,t\t)+ / g{x,t\T) 

Jo 

This allows us to simplify the calculations by storing C, at each value of s rather 
than needing to integrate / at each time step. 

In practice we want to find arbitrary derivatives of / evaluated at x = 0. To 
find this numerically, we differentiate these equations with respect to x to arrive 
at equations coupling derivatives of f{x, t) with derivatives of ^ at x = 0. Let 
us assume we know C(0, s) and its derivatives for s = 0, di, 2dt, t and /(O, t) 
and its derivatives. To find ^(0, t + dt) and /(O, t + dt), it is straightforward to 
use an implicit numerical method. 



C An equivalent formulation 

Although equations (|6])-([9| are intuitively appealing because of their similarity 
to the standard SIR equations, we can reduce them to a simpler form. We first 
replace e«(*) with Note that S = -b{t) = -C'ip{t). Further G = I{t)/C, 

so from the initial condition at f = 0, we can calculate C, and have no further 
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need for g. Thus we arrive at 

S = -Ci){t) (19) 

poo 

i = C^{t) -C V(t - r)Prec{T) dr (20) 

^0 

POO 

R = C iP{t-T)Prec{T)dT (21) 



N 

J^[ij,t] = -i;{t) (22) 

If we take as the initial condition that all infections at time t = begin their 

infection period at t = 0, then il;{t — T) = for t > t and wc can assume that the 
integrals have their upper limit at t = t. If we take some other initial condition, 
we may have to include the entire range of r. Although these equations are 
simpler to solve, they lose some of their intuitive appeal because it is more 
difficult to identify the meaning of each term. 
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